JSM 2023
Toronto, Canada
The University of Utah
2023-06-28
What I highlight in their paper:
Start to finish framework for multi-ERG models.
Dealing with heterogeneous samples.
Model building process.
Goodness-of-fit analyses.
Two important missing pieces (for the next paper): power analysis and how to deal with collinearity in small networks.
Two different questions: “How many nodes?” and “How many networks?”
Is the network bounded?
If it is bounded, can we collect all the nodes?
If we cannot collect all the nodes, can we do inference (Schweinberger, Krivitsky, and Butts 2017; Schweinberger et al. 2020)?
There is a growing number of studies featuring multiple networks (e.g., egocentric studies).
There’s no clear way to do power analysis in ERGMs.
In funding justification, power analysis is fundamental, so we need that.
We can leverage conditional ERG models for power analysis.
Conditioning on one sufficient statistic results in a distribution invariant to the associated parameter, formally:
\[\begin{align} \notag {\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y}= \boldsymbol{y}\left|\;\boldsymbol{g}\left(\boldsymbol{y}\right)_l = s_l\right.\right)} & = \frac{ {\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{g}\left(\boldsymbol{Y}\right)_{-l} = \boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}, \boldsymbol{g}\left(\boldsymbol{y}\right)_l = s_l\right) } }{ \sum_{\boldsymbol{y}'\in\mathcal{Y}:\boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y} = \boldsymbol{y}'\right) } } \\ & = \frac{ \mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\right\} }{ \kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l} }, \tag{1} \end{align}\]
where \(\boldsymbol{g}\left(\boldsymbol{y}\right)_l\) and \(\boldsymbol{\theta}_l\) are the \(l\)-th elements of \(\boldsymbol{g}\left(\boldsymbol{y}\right)\) and \(\boldsymbol{\theta}\) respectively, \(\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\) and \(\boldsymbol{\theta}_{-l}\) are their complements, and \(\kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l} = \sum_{\boldsymbol{y}' \in \mathcal{Y}: \boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}\mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}'\right)_{-l}\right\}\) is the normalizing constant.
We can use this to generate networks with a prescribed density (based on previous studies) and compute power through simulation.
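The invariance in Eq. (1) can be checked by brute force on a very small network. The sketch below is only an illustration: the four-node graph, the `gender` labels, the two-term model (edges and homophily), and the parameter values are all assumptions chosen for the example, not taken from the paper. It enumerates every undirected graph with exactly three ties and compares the conditional distribution under two different edge parameters.

```python
import itertools
import numpy as np

n_nodes = 4
gender = np.array([0, 0, 1, 1])                      # hypothetical node attribute
dyads = list(itertools.combinations(range(n_nodes), 2))

def stats(y):
    """Sufficient statistics g(y) = (edges, same-gender ties) for a tuple of tie indicators."""
    edges = sum(y)
    homophily = sum(t for t, (i, j) in zip(y, dyads) if gender[i] == gender[j])
    return np.array([edges, homophily], dtype=float)

def conditional_pmf(theta, s_edges):
    """Pr(Y = y | edges = s_edges), computed by enumerating the constrained support."""
    support = [y for y in itertools.product((0, 1), repeat=len(dyads))
               if sum(y) == s_edges]
    weights = np.array([np.exp(theta @ stats(y)) for y in support])
    return support, weights / weights.sum()

# Same homophily parameter, very different edge parameters.
support, pmf_a = conditional_pmf(np.array([-1.0, 0.5]), s_edges=3)
_, pmf_b = conditional_pmf(np.array([2.0, 0.5]), s_edges=3)

invariant = bool(np.allclose(pmf_a, pmf_b))
print(len(support), invariant)
```

Because the edge count is fixed, the factor involving the edge parameter is the same for every graph in the support and cancels in the ratio, which is what the comparison of `pmf_a` and `pmf_b` checks.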
Study gender homophily in networks of size 8.
On average, the focal networks have 20 ties (i.e., a density of \((2\times 20)/(8 \times 7) \approx 0.71\)).
To detect an effect size of \(\boldsymbol{\theta}_{\mbox{homophily}} = 2\), we could approximate the required sample size as follows:
With Eq. (1), use MCMC to simulate \(1{,}000\) sets of \(n\) networks of size 8 and 20 ties.
For each set, fit a conditional ERGM to estimate \(\widehat{\boldsymbol{\theta}}_{\mbox{homophily}}\), and generate the indicator variable \(p_{n, i}\) equal to one if the estimate is significant at the 95% confidence level.
The empirical power for \(n\) is equal to \(p_n \equiv \frac{1}{1{,}000}\sum_{i}p_{n, i}\).
Once we have computed the sequence \(\{p_{10}, p_{20}, \dots\}\), we can fit a linear model to estimate the sample size as a function of the power, i.e., \(n = \beta_0 + \beta_1 p_n + \beta_2 p_n^2 + \varepsilon\).
With the previous model in hand, we can estimate the sample size required to detect a given effect size with a given power.
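The steps above can be sketched in a few lines under strong simplifying assumptions, all of them mine rather than the paper's: a 4 + 4 gender split, homophily as the only free term, and a pooled model across networks. In that special case the number of same-gender ties \(h\), given 20 ties, has a closed-form distribution, so the sketch samples \(h\) exactly instead of running MCMC and uses a Wald test at the 5% level. The demo call uses a smaller, hypothetical effect size of 0.5 because, in this simplified setting, power for an effect of 2 may saturate at very small \(n\).

```python
import numpy as np
from math import comb

# Assumed design: 8 nodes split 4 + 4, so 2*C(4,2) = 12 same-gender dyads and
# 4*4 = 16 mixed dyads; every network has exactly 20 ties.
N_HOMO, N_HETERO, N_TIES = 12, 16, 20
H = np.arange(max(0, N_TIES - N_HETERO), min(N_HOMO, N_TIES) + 1)  # support of h
LOG_COUNT = np.log([comb(N_HOMO, h) * comb(N_HETERO, N_TIES - h) for h in H])

def pmf(theta):
    """Pr(h | 20 ties), proportional to (#graphs with h) * exp(theta * h); one row per theta."""
    logw = LOG_COUNT + np.atleast_1d(theta)[:, None] * H
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def fit(mean_h, lo=-10.0, hi=10.0, iters=60):
    """Pooled MLE by bisection: solve E_theta[h] = observed mean of h (bounded search)."""
    lo = np.full(mean_h.shape, lo)
    hi = np.full(mean_h.shape, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        too_low = pmf(mid) @ H < mean_h
        lo, hi = np.where(too_low, mid, lo), np.where(too_low, hi, mid)
    return (lo + hi) / 2

def empirical_power(theta, n, n_sims=1000, seed=0):
    """Share of simulated sets of n networks whose Wald test rejects theta = 0."""
    rng = np.random.default_rng(seed)
    h = rng.choice(H, size=(n_sims, n), p=pmf(theta)[0])
    theta_hat = fit(h.mean(axis=1))
    p = pmf(theta_hat)
    mean = p @ H
    var_h = (p * (H - mean[:, None]) ** 2).sum(axis=1)  # per-network Fisher information
    z = theta_hat * np.sqrt(n * var_h)                  # theta_hat / standard error
    return float(np.mean(np.abs(z) > 1.96))

ns = np.array([10, 20, 30, 40, 50])
powers = np.array([empirical_power(0.5, n) for n in ns])

# Quadratic regression of n on power, then predict n at a target power of 0.80.
coef = np.polyfit(powers, ns, deg=2)
n_for_80 = float(np.polyval(coef, 0.80))
print(dict(zip(ns.tolist(), powers.tolist())), round(n_for_80, 1))
```

With more model terms the closed form no longer applies, and the simulation step would need MCMC under the edge-count constraint, as the steps above describe.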
The Variance Inflation Factor (VIF) is a common measure of collinearity in regular regression models.
Usually, VIF > 10 is considered problematic.
Duxbury’s (2021) large simulation study recommends using a VIF between 150 and 200 as a threshold for multicollinearity.
In small networks, this could be more severe.
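As a reminder of what the VIF measures, it can be read off the diagonal of the inverse correlation matrix, which equals \(1/(1 - R_j^2)\) for each column \(j\). The sketch below is a generic illustration: the matrix of "statistics" is synthetic Gaussian data with hypothetical column names, standing in for sufficient statistics computed on simulated networks, and it is not a reproduction of Duxbury's procedure.

```python
import numpy as np

def vif(stats):
    """VIF for each column of `stats`: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(stats, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(1)
n_draws = 5000

# Synthetic stand-ins (assumption): `triangles` is built to correlate ~0.99
# with `edges`, while `homophily` is independent of both.
edges = rng.normal(size=n_draws)
triangles = 0.99 * edges + np.sqrt(1 - 0.99**2) * rng.normal(size=n_draws)
homophily = rng.normal(size=n_draws)

vifs = vif(np.column_stack([edges, triangles, homophily]))
print(np.round(vifs, 1))
```

The two highly correlated columns get large VIFs while the independent one stays near one, which is the pattern the thresholds above are meant to flag.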
A few questions:
How would you address power analysis in ERGMs?
VIFs and correlations across statistics are very high in dense networks. How much do you think it matters? If it matters, how would you address it?
Relating both, is there any way in which a large sample size can help with collinearity?
In KCH, effect sizes are significantly large.
How much heterogeneity? Networks in KCH range from two to eight, but how about bigger samples? Schweinberger, Krivitsky, and Butts (2017) mention the term “comparative”, but there is no clear definition of what it means.
george.vegayon at utah.edu
Vega Yon et al – ggv.cl/slides/jsm2023 – The University of Utah